-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized Privilege Evaluation #4380
base: main
Are you sure you want to change the base?
Optimized Privilege Evaluation #4380
Conversation
Please have a look at the approaches. I would be very interested in your opinions. As mentioned above, the implementation is not complete yet. The code contains a couple of TODO comments to indicate what work needs to be done. I would also like to discuss whether a couple of things would be really necessary or whether there might be a chance to simplify the implementation by abolishing them. These are:
|
I have also started to work on the micro benchmarks as discussed in #3903. The generally accepted standard for micro benchmarks in Java is the JMH framework. However, this is licensed as GPL v2 with classpath exception: https://github.com/openjdk/jmh/blob/master/LICENSE Is the inclusion of a dependency with such a license acceptable in OpenSearch? |
src/main/java/org/opensearch/security/privileges/CheckTable.java
Outdated
Show resolved
Hide resolved
@cwperks Can you look into this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work here @nibix, looking forward to seeing the micro-benchmark numbers and seeing the test start passing again :D
I managed to do a high level pass of the changes, but notably did not look into depth on ActionPrivileges and FlattenedActionGroups.
src/test/java/org/opensearch/security/privileges/ActionPrivilegesTest.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/security/privileges/UserAttributes.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/security/privileges/ProtectedIndexAccessEvaluator.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/security/privileges/PrivilegesEvaluatorResponse.java
Outdated
Show resolved
Hide resolved
We're looking into this and will get back with an answer. |
The feedback we received was that the code can be used only for internal operations. Since JMH usage will be part of Open-source security, my understanding is that this is not approved. |
d128425
to
1714bd7
Compare
1714bd7
to
764b826
Compare
b5ff5c8
to
004df3b
Compare
@cwperks @peternied @DarshitChanpura @scrawfor99 Just FYI: I worked a bit on the micro benchmarking part of this issue. As JMH was out due to its license, I reviewed other frameworks. It is remarkable that in most cases the descriptions of the frameworks will say "rather use JMH instead of this framework". Anyway, I tried out https://github.com/noconnor/JUnitPerf because the idea of using JUnit infrastructure seemed to be nice. The big downside of JUnitPerf is that it does not work well together with parameterized JUnit tests. See here for an example: The high number of very similar methods is caused by the lack of parameter support - in the end we need to test quite a few different dimensions (like number of indices, number of roles, etc), on the same operation. As I was really keen on getting some broader result, I went on the "roll your own" path and quick threw together some naive micro benchmarking code. So, this is just a temporary thing, thus very messy, but it gives me some numbers. See here: So, I let run some tests and here are some preliminary results. Micro benchmark test resultsDisclaimerGenerally, the real world meaningfulness of micro benchmarks is limited. On a full real cluster, this can look totally different due to:
On the other hand, micro benchmarks make some tests so much easier. For micro benchmarking, a Full cluster benchmarks are also coming up, but these are still in the works. ScopeThe micro benchmarks were applied to the following code: Lines 501 to 512 in 004df3b
For comparison, we also applied the micro benchmarks to the following code on the old code base: Due to refactorings, the code looks different. However, what happens under the hood is effectively the same. Additionally some further code changes were necessary to make As we only look at the Tested dimensionsAction requestsWe tested privilege evaluation with three different actions:
Number of indices on clusterWe tested with these indices:
Different user configurations
ResultsThe raw result data can be found here: https://docs.google.com/spreadsheets/d/1Hd6pZFICTeplXIun3UpEplANAwQDE0jXbqLnnJz61AI/edit?usp=sharing In the shards below, dashed lines indicate the performance of the old privilege evaluation code on a particular combination of test dimensions. Solid lines with the same color indicate the performance of the new code with the same test dimensions. The x-axis represents the number of indices on the cluster, the y-axis represents the throughput in operations per second.
|
30fcc0f
to
41dd986
Compare
7adb281
to
504a157
Compare
@nibix thank you for the update. The results of bulk indexing in a cluster with a large number of indices is great to see and I like how the test cases isolate PrivilegeEvaluation for performance testing. Is there any work remaining before labeling this PR ready for review? |
Yes. Actually, I was also about to ping you regarding this.
Note: This leaves the DLS/FLS implementation unchanged at the moment. Thus, we will still have a part of the problematic performance characteristics. However, I would still get this merged and then add DLS/FLS support in a further PR to keep things (relatively) small. |
b30b530
to
78db68b
Compare
fca7058
to
fcc6183
Compare
Benchmark resultsAfter having shared the micro benchmarks before, I can now share results of benchmarks performed on an actual cluster. This brings the following benefits:
DisclaimerThe usual benchmarking disclaimer: Of course, these figures can only represent very specific scenarios. Other scenarios can look quite different, as there are so many variables which can be different. Yet, especially the "slow parts" of the benchmark results can give one an impression where real performance issues are. Test environmentWe ran the tests on an OpenSearch cluster hosted using Kubernetes. The host machine for the K8S nodes was a physical server with an AMD EPYC 7401P CPU. We ran an OpenSearch cluster with 3 nodes, each node had 64 GB of RAM, 32 GB of Java Heap, and a Tested dimensionsOperationsWe benchmarked the following operations:
Number of indices on clusterWe tested with these indices:
User configurationsWe tested with differently configured users to find out about the effects of complexity of roles, DLS rules, etc. The users were:
Test resultsA commented summary of the results follows in the upcoming sections. The raw test results can be also reviewed at https://docs.google.com/spreadsheets/d/16VRr9B2bPTyyS_T-IZobUG3C0uJxnXRxlbmsdH_MeCg/edit?usp=sharing Indexing throughputBulk size 10In this and the following charts, the dashed lines represent OpenSearch with the standard security plugin. The solid lines represent OpenSearch with the security plugin with the optimized privilege evaluation code. The blue line represents requests authenticated by the super admin certificate, which by-passes most of privilege evaluation. Thus, the blue line forms a kind of "hull curve", it can be seen as a rough theoretical maximum from a security plugin POV. The green lines represent users with full privileges. Yellow lines represent users with limited privileges - the darker the yellow, the more complex the role configuration. The benchmark results for the standard security plugin show a clear performance decline starting at about 300 indices on the cluster. This is caused by the privilege evaluation code resolving the index patterns in the roles configuration against all cluster indices for each request. At 1000 indices, we only get roughly 60% of the throughput that was observed for 10 indices. Additionally, it can be seen that growing role complexity has a clear effect on throughput. More complex roles show a significant lower throughput. On the other hand, the optimized security plugin shows a mostly linear throughput, which is independent of the number of indices. There is a slight decline starting from 3000 indices. The reasons for this are unknown so far. Yet, these values are still well above the standard plugin. Bulk size 1000Larger bulk sizes shift the place where the gap between standard and optimized plugin performance opens a bit to the right. Here we see a strong performance decline in the standard security plugin starting from 1000 indices. The optimized security plugin exhibits better performance in all cases, though. Additionally, the performance decline that was visible for the bulk: 10 case is not really visible here any more. The charts show a low throughput for the full privileges user both for the standard plugin and for the optimized plugin in the indices: 10 case. However, I would guess that this is some artifact of the used benchmarking process and is not a real performance characteristic. Search throughputSearch on single indexThe search throughput graphs introduce one further user: Users symbolized by the purple line are users which do have a document level access restriction implemented with a DLS query in one role. Like for the bulk indexing, the standard plugin shows a declining performance with increasing number of indices on the clusters. Also here, the complexity of the roles configuration has a very significant effect on the throughput. Especially the user with DLS exhibits heavy performance degradations. With 300 indices, the DLS user shows less than half the throughput of the full privileges user of the standard plugin. On the other hand, the optimized plugin shows mostly constant performance characteristics, independent of the number of indices. Also the DLS user does not show any significant degradation of performance. Search on 2% of indicesWhen an index pattern comes into play, both the standard plugin and the optimized plugin show a performance degradation with a growing number of indices. This is due to the necessary resolution of the index pattern against all indices on the cluster. The blue line - the super admin user - shows that there is quite a gap (about 20%, growing with higher number of indices) between the theoretically possible throughput and also the optimized plugin. This is likely due to the code in https://github.com/nibix/security/blob/main/src/main/java/org/opensearch/security/resolver/IndexResolverReplacer.java which still leaves some room for improvement. Still, the optimized plugin delivers a clearly higher throughput in all cases. Especially the DLS performance has strongly improved. For 1000 indices, we see a throughput of 1035 operations per second on the optimized plugin. The standard plugin just manages 53 operations per second with a service time of 300 ms per request. With 10000 indices, DLS gets virtually unusable on the standard plugin with just 0.6 operations per second and a service time of 16 seconds. The optimized plugin still delivers 99 operations per second. Search on 20% of indicesWith a search operation on 20% of indices, the index resolution performance gets so dominant that significant performance gains are no longer visible - except for DLS, which still shows very strong improvements. Using DLS with the standard plugin delivers a throughput of 7 operations per second already for just 1000 indices (service time 2.2 seconds). The optimized plugin still delivers a throughput of 113 operations per second (service time 176 ms). The blue line - the admin cert user - shows that there is still room for improvement, though. |
3df0a95
to
1964b4d
Compare
Side note: This is not the end of the story - there's still significant potential for performance improvements. I will file issues about that soon. |
Note: The patch coverage is only 84.5% which is a bit low for my taste: #4380 (comment) However, this low figure is mostly due to indentation changes in DLS/FLS classes due to newly necessary exception handlers. Thus, the low coverage of these classes leaks into this patch. I have a couple of ideas on how to improve coverage for these classes, but I guess that this is outside of the scope of this PR. |
I look forward to seeing the new issues filed. Great detective work @nibix ! |
src/main/java/org/opensearch/security/privileges/dlsfls/AbstractRuleBasedPrivileges.java
Show resolved
Hide resolved
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
…aluation Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
f59eacf
to
1fef11b
Compare
src/integrationTest/java/org/opensearch/security/privileges/dlsfls/FlsDocumentFilterTest.java
Show resolved
Hide resolved
src/main/java/org/opensearch/security/privileges/dlsfls/DlsFlsProcessedConfig.java
Show resolved
Hide resolved
public Set<String> getMissingPrivileges() { | ||
return new HashSet<String>(missingPrivileges); | ||
return this.indexToActionCheckTable != null ? this.indexToActionCheckTable.getIncompleteColumns() : Collections.emptySet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always be empty if the checkTable is null? In what scenarios can the checktable be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the last remaining case where checktable
is null:
security/src/main/java/org/opensearch/security/privileges/PrivilegesEvaluator.java
Lines 328 to 343 in 1fef11b
PrivilegesEvaluatorResponse presponse = new PrivilegesEvaluatorResponse(); | |
final String injectedRolesValidationString = threadContext.getTransient( | |
ConfigConstants.OPENDISTRO_SECURITY_INJECTED_ROLES_VALIDATION | |
); | |
if (injectedRolesValidationString != null) { | |
HashSet<String> injectedRolesValidationSet = new HashSet<>(Arrays.asList(injectedRolesValidationString.split(","))); | |
if (!mappedRoles.containsAll(injectedRolesValidationSet)) { | |
presponse.allowed = false; | |
presponse.missingSecurityRoles.addAll(injectedRolesValidationSet); | |
log.info("Roles {} are not mapped to the user {}", injectedRolesValidationSet, user); | |
return presponse; | |
} | |
mappedRoles = ImmutableSet.copyOf(injectedRolesValidationSet); | |
context.setMappedRoles(mappedRoles); | |
} |
Indeed, the concept of missing privileges in not applicable there.
Of course, I can also try to refactor this, but I am not sure if that's really in the scope of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be addressing this in a follow-up? Seems like a good piece to refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, cleaning up PrivilegeEvaluatorResponse is a good idea!
|
||
if (roleSetBuilder.getEstimatedByteSize() + indexMapBuilder | ||
.getEstimatedByteSize() > statefulIndexMaxHeapSize.getBytes()) { | ||
log.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codecov is showing that these lines are not hit. Are there any tests that cover this scenario?
return response; | ||
} | ||
|
||
public static PrivilegesEvaluatorResponse insufficient( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the second arg is not needed here can it be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in c474bf7
@@ -262,7 +325,7 @@ public PrivilegesEvaluatorResponse evaluate(PrivilegesEvaluationContext context) | |||
action0 = PutMappingAction.NAME; | |||
} | |||
|
|||
final PrivilegesEvaluatorResponse presponse = new PrivilegesEvaluatorResponse(); | |||
PrivilegesEvaluatorResponse presponse = new PrivilegesEvaluatorResponse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this presponse object is primarily used for exiting early from privileges evaluation, like in the case of snapshot, system index, protected index and PIT privilege. Since its using one of the constructors directly, the error message for privilege evaluation will not include the missing privileges. Do you think it makes sense to return the missing privileges for such cases?
i.e.
if (snapshotRestoreEvaluator.evaluate(request, task, action0, clusterInfoHolder, presponse).isComplete()) {
if (!presponse.isAllowed()) {
return PrivilegesEvaluatorResponse.insufficient(action0, context);
} else {
return presponse;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at SnapshotRestoreEvaluator, there are actually quite a few different error scenarios:
- The operation is not allowed at all for normal users
- The operation includes protected indices
The actual presence of the permission is actually not checked in the SnapshotRestoreEvaluator, this is checked below in the "normal" privilege evaluation support.
Thus, it would be misleading to report the restore action as missing privilege, as adding that privilege won't fix the error.
However, one could extend SnapshotRestoreEvaluator to include the reason of the error in the reason
attribute of presponse. That is however IMHO outside of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @nibix for improving this feature. I took initial pass and have left comments, most of which are clarification questions.
src/main/java/org/opensearch/security/OpenSearchSecurityPlugin.java
Outdated
Show resolved
Hide resolved
src/integrationTest/java/org/opensearch/security/privileges/ActionPrivilegesTest.java
Show resolved
Hide resolved
src/integrationTest/java/org/opensearch/security/privileges/ActionPrivilegesTest.java
Show resolved
Hide resolved
src/integrationTest/java/org/opensearch/security/privileges/ActionPrivilegesTest.java
Show resolved
Hide resolved
src/integrationTest/java/org/opensearch/security/privileges/ActionPrivilegesTest.java
Show resolved
Hide resolved
src/main/java/org/opensearch/security/securityconf/impl/SecurityDynamicConfiguration.java
Show resolved
Hide resolved
private IndexPattern(WildcardMatcher staticPattern, ImmutableList<String> patternTemplates, ImmutableList<String> dateMathExpressions) { | ||
this.staticPattern = staticPattern; | ||
this.patternTemplates = patternTemplates; | ||
this.dateMathExpressions = dateMathExpressions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why require a separate variable for these? Is it for daily-rotating index or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a feature of the index_pattern
attribute in the roles configuration to support also date math expressions - it has been there from the beginning.
I guess you could use it for defining that users are just allowed to see the data from today's index, but not from indices in the past. Yet, I find the use of date math for this purpose questionable, as it lacks expressiveness.
So, this is just for backwards compat.
See also the comment here:
security/src/main/java/org/opensearch/security/privileges/IndexPattern.java
Lines 87 to 88 in 1fef11b
// Note: The use of date math expressions in privileges is a bit odd, as it only provides a very limited | |
// solution for the potential user case. A different approach might be nice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL. Hmm, I never have used it so didn't understand the comment fully.
src/main/java/org/opensearch/security/privileges/PrivilegesEvaluator.java
Show resolved
Hide resolved
public Set<String> getMissingPrivileges() { | ||
return new HashSet<String>(missingPrivileges); | ||
return this.indexToActionCheckTable != null ? this.indexToActionCheckTable.getIncompleteColumns() : Collections.emptySet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be addressing this in a follow-up? Seems like a good piece to refactor
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Signed-off-by: Nils Bandener <nils.bandener@eliatra.com>
Description
This implements the optimized privilege evaluation as described in #3870.
This introduces de-normalized data structures that are optimized for the checks that need to be done during privilege evaluation. Additionally, certain objects (like DLS queries) are prepared ahead of time, as early as possible in order to minimize the overhead during actual privilege evaluation.
This is a big change set - in order to facilitate the review, I have split it into three major commits:
The code is extensively commented - I hope that will help during review.
Performance tests indicate that the OpenSearch security layer adds a noticeable overhead to the indexing throughput of an OpenSearch cluster. The overhead may vary depending on the number of indices, the use of aliases, the number of roles and the size of the user object. The goal of these changes is to improve privilege evaluation performance and to make it less dependent on the number of indices, etc.
No significant behavioral changes in the "happy case", when privileges are present.
The undocumented config option
config.dynamic.multi_rolespan_enabled
is no longer evaluated. The code now behaves like it is always set totrue
- that is the former default. See #4495 for details.Some slight changes are present in error cases:
Issues Resolved
Testing
SecurityBackwardsCompatibilityIT
(extended in Fixed bulk index requests in BWC tests and hardened assertions #4817 )Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.